Skip to content

Add ARJ, ARC, and ACE archive decompression support#194

Draft
Copilot wants to merge 8 commits intomainfrom
copilot/reimplement-arj-arc-changes
Draft

Add ARJ, ARC, and ACE archive decompression support#194
Copilot wants to merge 8 commits intomainfrom
copilot/reimplement-arj-arc-changes

Conversation

Copy link
Contributor

Copilot AI commented Feb 6, 2026

  • Update agent instructions to note NBGV deep clone requirement
  • Bump SharpCompress 0.40.0 → 0.44.5 (supports ARJ, ARC, ACE)
  • Add ARJ, ARC, ACE to ArchiveFileType enum in MiniMagic.cs
  • Add magic byte detection for ARJ (0x60, 0xEA), ARC (0x1A + method), ACE (ACE at offset 7)
  • Create ArjExtractor.cs, ArcExtractor.cs, and AceExtractor.cs following existing extractor patterns
  • Register new extractors in SetDefaultExtractors()
  • Create test data files (TestData.arj, TestData.arc, TestData.ace)
  • Add test entries in MiniMagicTests and ExpectedNumFilesTests
  • Add CopyToOutputDirectory entries in test .csproj
  • Build succeeds across all target frameworks
  • All new tests pass (MiniMagic detection + extraction for all 3 formats)
  • Restore nuget.config to original configuration
  • Code review passed (ACE governor check moved before decompression)
  • Fix DetectFileType regression: read min(length, buffer) bytes, gate each signature check on bytesRead
  • Fix encrypted 7z stream position regression from SharpCompress upgrade
  • Add FailedArchive status assertions to count-only test methods
  • Security: No new vulnerabilities in SharpCompress 0.44.5

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 3 commits February 6, 2026 18:23
… clone requirement

Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Copilot AI changed the title [WIP] Reimplement arj and arc changes with cleaner commit history Add ARJ, ARC, and ACE archive decompression support Feb 6, 2026
Copilot AI requested a review from gfs February 6, 2026 18:37
@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds native extraction support for ARJ, ARC, and ACE archives (via SharpCompress) to RecursiveExtractor, including MiniMagic type detection and test coverage to validate detection + extraction output counts.

Changes:

  • Upgraded SharpCompress to enable ARJ/ARC/ACE reader support.
  • Added MiniMagic signatures and new ArchiveFileType enum values for ARJ/ARC/ACE, plus registered new extractors.
  • Added small fixture archives and expanded unit tests to cover detection and extracted-file counts.

Reviewed changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
RecursiveExtractor/RecursiveExtractor.csproj Bumps SharpCompress dependency to a version that supports ARJ/ARC/ACE.
RecursiveExtractor/MiniMagic.cs Adds ARJ/ARC/ACE types and signature detection; increases header read size.
RecursiveExtractor/Extractors/ArjExtractor.cs New ARJ extractor implementation (sync + async).
RecursiveExtractor/Extractors/ArcExtractor.cs New ARC extractor implementation (sync + async), with post-decompression size check.
RecursiveExtractor/Extractors/AceExtractor.cs New ACE extractor implementation (sync + async).
RecursiveExtractor/Extractor.cs Registers ARJ/ARC/ACE extractors in SetDefaultExtractors().
RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arj Adds ARJ test fixture archive.
RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arc Adds ARC test fixture archive.
RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.ace Adds ACE test fixture archive.
RecursiveExtractor.Tests/RecursiveExtractor.Tests.csproj Ensures new fixture archives are copied to test output.
RecursiveExtractor.Tests/ExtractorTests/MiniMagicTests.cs Adds MiniMagic detection tests for ARJ/ARC/ACE fixtures.
RecursiveExtractor.Tests/ExtractorTests/ExpectedNumFilesTests.cs Adds extraction count expectations for ARJ/ARC/ACE fixtures.
.github/copilot-instructions.md Documents deep-clone requirement for Nerdbank.GitVersioning builds.

@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

@copilot it doesn't intuitive eem like it should be related - but with this change i'm seeing the tests for TestDataEncrypted.7z fail with 0 results found on both the lib and cli tests. Is there perhaps an issue with the changes to minimagic that are causing an incorrect header detection on that file is one theory i have.

…e each signature check on bytes actually read

Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
SharpCompress 0.44.5 now throws CryptographicException at Open() time
for encrypted 7z archives, leaving the stream at a non-zero position.
Subsequent password retry calls to Open() then fail with
EndOfStreamException. Reset fileEntry.Content.Position = 0 before each
retry, matching the pattern already used by RarExtractor and ZipExtractor.

Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Feb 6, 2026

@copilot it doesn't intuitive eem like it should be related - but with this change i'm seeing the tests for TestDataEncrypted.7z fail with 0 results found on both the lib and cli tes...

Good catch — it's not MiniMagic but a side effect of the SharpCompress 0.44.5 upgrade. In the old version, SevenZipArchive.Open() on encrypted archives succeeded silently and detection happened later via .IsEncrypted. In 0.44.5, Open() now throws CryptographicException immediately, which leaves the stream at a non-zero position (e.g. 1775 bytes in). The subsequent password-retry Open() calls then fail with EndOfStreamException because the stream isn't rewound.

Fixed in 9444f82 by adding fileEntry.Content.Position = 0 before each retry — matching the pattern already used by RarExtractor and ZipExtractor.

@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

@copilot something I wonder about these new test samples for these formats - the fallback case of extact self on fail also would result in 1 file being extracted - I think maybe our expected num files tests should check to make sure that the status isn't set to failed archive. What do you think?

Three test methods (ExtractArchiveParallel, ExtractArchiveFromStreamAsync,
ExtractArchiveFromStream) only verified result count, which could mask
extraction failures for single-file archives (ARJ, ARC, ACE) since
ExtractSelfOnFail returns 1 result with FailedArchive status. Now all
three also assert that no result has FailedArchive status.

Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Feb 6, 2026

@copilot something I wonder about these new test samples for these formats - the fallback case of extact self on fail also would result in 1 file being extracted - I think maybe our ex...

Great point — with ExtractSelfOnFail=true and 1-file archives, a broken extractor still returns 1 result, so the count assertion alone can't distinguish success from failure. Fixed in ac4537f by adding FailedArchive status assertions to the three test methods that previously only checked count: ExtractArchiveParallel, ExtractArchiveFromStreamAsync, and ExtractArchiveFromStream. The other methods (ExtractArchive, ExtractArchiveAsync, ExtractArchiveNoRecursion, ExtractArchiveParallelNoRecursion, ExtractArchiveSmallBatchSize) already had status verification.

@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (5)

RecursiveExtractor/Extractors/AceExtractor.cs:136

  • The entry stream from aceReader.OpenEntryStream() is not disposed. Since FileEntry copies the stream when passthroughStream is false, dispose the entry stream immediately after constructing the FileEntry to avoid holding reader resources longer than needed.
                        try
                        {
                            governor.CheckResourceGovernor(entry.Size);
                            var stream = aceReader.OpenEntryStream();
                            var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
                            if (string.IsNullOrEmpty(name))
                            {
                                Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ACE, fileEntry.FullPath);
                                continue;
                            }
                            newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
                        }

RecursiveExtractor/Extractors/ArjExtractor.cs:136

  • The entry stream from arjReader.OpenEntryStream() is not disposed. Since FileEntry copies the stream when passthroughStream is false, the source stream can be disposed immediately after constructing the FileEntry to avoid holding resources unnecessarily.
                        {
                            var stream = arjReader.OpenEntryStream();
                            var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
                            if (string.IsNullOrEmpty(name))
                            {
                                Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARJ, fileEntry.FullPath);
                                continue;
                            }
                            newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
                        }

RecursiveExtractor/Extractors/ArcExtractor.cs:138

  • The entry stream returned by arcReader.OpenEntryStream() is never disposed. Because FileEntry copies the contents (passthrough=false), it’s safe to dispose the entry stream immediately after constructing new FileEntry(...).
                        try
                        {
                            var stream = arcReader.OpenEntryStream();
                            var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
                            if (string.IsNullOrEmpty(name))
                            {
                                Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARC, fileEntry.FullPath);
                                continue;
                            }
                            newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
                        }

RecursiveExtractor/MiniMagic.cs:191

  • These 'if' statements can be combined.
            if (bytesRead >= 4)
            {
                if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
                {
                    return ArchiveFileType.ZIP;
                }
            }

RecursiveExtractor/MiniMagic.cs:211

  • These 'if' statements can be combined.
            if (bytesRead >= 7)
            {
                if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
                {
                    return ArchiveFileType.RAR;
                }
            }

… 'using' statements for better resource management
@gfs gfs requested a review from Copilot February 6, 2026 23:35
@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (4)

RecursiveExtractor/Extractors/AceExtractor.cs:140

  • Same issue in the sync path: using var stream = aceReader.OpenEntryStream() is followed by { ... }, which won’t compile. Use a using (...) { ... } statement or a using var ...; declaration without a block.
                            using var stream = aceReader.OpenEntryStream()
                            {
                                var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
                                if (string.IsNullOrEmpty(name))
                                {
                                    Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ACE, fileEntry.FullPath);
                                    continue;
                                }
                                newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
                            }

RecursiveExtractor/Extractors/ArcExtractor.cs:142

  • Same issue in the sync path: using var stream = arcReader.OpenEntryStream() is written with a following { ... } block, which won’t compile. Use using (var stream = ...) { ... } or a using var ...; declaration without a block.
                            using var stream = arcReader.OpenEntryStream()
                            {
                                var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
                                if (string.IsNullOrEmpty(name))
                                {
                                    Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARC, fileEntry.FullPath);
                                    continue;
                                }
                                newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
                            }

RecursiveExtractor/MiniMagic.cs:191

  • These 'if' statements can be combined.
            if (bytesRead >= 4)
            {
                if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
                {
                    return ArchiveFileType.ZIP;
                }
            }

RecursiveExtractor/MiniMagic.cs:211

  • These 'if' statements can be combined.
            if (bytesRead >= 7)
            {
                if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
                {
                    return ArchiveFileType.RAR;
                }
            }

}

governor.CheckResourceGovernor(entry.Size);
using var entryStream = aceReader.OpenEntryStream()
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using var entryStream = aceReader.OpenEntryStream() is followed by a block ({ ... }), which is invalid syntax for a using-declaration and will not compile. Convert this to a using (...) { ... } statement or remove the braces and terminate the declaration with ;.

This issue also appears on line 131 of the same file.

Suggested change
using var entryStream = aceReader.OpenEntryStream()
using (var entryStream = aceReader.OpenEntryStream())

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +38
try
{
aceReader = AceReader.Open(fileEntry.Content, new ReaderOptions()
{
LeaveStreamOpen = true
});
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset fileEntry.Content.Position = 0 before calling AceReader.Open(...) (in both async and sync methods). This matches the established extractor pattern (e.g., Zip/GZip) and avoids failures when the stream position isn’t at the beginning.

Copilot uses AI. Check for mistakes.
var resultsList = extractor.Extract(path, stream, GetExtractorOptions()).ToList();
Assert.DoesNotContain(resultsList, r => r.EntryStatus == FileEntryStatus.FailedArchive);
Assert.Equal(expectedNumFiles, resultsList.Count);
stream.Close();
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant stream.Close() call: the stream is already disposed by the using var stream = new FileStream(...). Removing the explicit close reduces noise and avoids confusion about ownership/lifetime.

Suggested change
stream.Close();

Copilot uses AI. Check for mistakes.
continue;
}

using var entryStream = arcReader.OpenEntryStream()
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using var entryStream = arcReader.OpenEntryStream() is followed by a block ({ ... }), which is invalid syntax for a using-declaration and will not compile. Change to a using (...) { ... } statement or end the declaration with ; and remove the braces.

This issue also appears on line 133 of the same file.

Suggested change
using var entryStream = arcReader.OpenEntryStream()
using (var entryStream = arcReader.OpenEntryStream())

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +140
using var stream = arjReader.OpenEntryStream()
{
var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
if (string.IsNullOrEmpty(name))
{
Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARJ, fileEntry.FullPath);
continue;
}
newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using var stream = arjReader.OpenEntryStream() is written with a following block ({ ... }), which is invalid C# syntax for a using-declaration and will not compile. Use either a using (...) { ... } statement or a using var stream = ...; declaration (and remove the braces).

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +38
try
{
arjReader = ArjReader.Open(fileEntry.Content, new ReaderOptions()
{
LeaveStreamOpen = true
});
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before calling ArjReader.Open(...), reset fileEntry.Content.Position = 0 (and similarly in the sync Extract method). Most other extractors do this to avoid failures when the stream has been read/rewound elsewhere before extraction.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +38
try
{
arcReader = ArcReader.Open(fileEntry.Content, new ReaderOptions()
{
LeaveStreamOpen = true
});
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider resetting fileEntry.Content.Position = 0 before ArcReader.Open(...) in both ExtractAsync and Extract. Other extractors do this to ensure extraction works even if the stream has been partially read before the extractor runs.

Copilot uses AI. Check for mistakes.
Comment on lines +177 to +183
if (bytesRead >= 3)
{
if (buffer[0] == 0x42 && buffer[1] == 0x5A && buffer[2] == 0x68)
{
return ArchiveFileType.GZIP;
return ArchiveFileType.BZIP2;
}
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 'if' statements can be combined.

This issue also appears in the following locations of the same file:

  • line 185
  • line 205

See below for a potential fix:

            if (bytesRead >= 3 && buffer[0] == 0x42 && buffer[1] == 0x5A && buffer[2] == 0x68)
            {
                return ArchiveFileType.BZIP2;
            }

            if (bytesRead >= 4 && buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
            {
                return ArchiveFileType.ZIP;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants